A. INTRODUCTION

The overall goal of this STTR program is to formulate perception-related integration rules over
time and frequency, presumably realized at post-Auditory Nerve (AN) layers, in the context of
speech perception in the presence of environmental noise. In particular, we aim to develop models
of auditory processing capable of predicting the phonetic confusions made by normal-hearing
listeners under a variety of acoustic distortions. A prerequisite is to formulate the signal
processing principles by which the auditory system provides the observed graceful degradation of
human performance in noise.

Towards this end we propose to model two interleaving functions: (1) the role of the descending
pathway in regulating the operating point of the cochlea, resulting in AN representations of
speech sounds that are less sensitive to changes in environmental conditions, and (2) the role of
post-AN functions in extracting important acoustic-phonemic cues from the AN firing patterns. The
underlying assumption is that the regulating mechanism and the post-AN mechanisms work in concert.
Current models of the periphery are based upon the ascending pathway up through the AN. We propose
to incorporate the descending pathway as well, mainly the Medial Olivocochlear (MOC)¹ feedback
mechanism, and to model the way the ascending and descending pathways interact. As a case study we
shall focus on the processing of speech in the presence of additive speech-shaped noise. We
hypothesize that the cochlear response in the presence of background noise is (much) more stable
than the output of current feed-forward models. This hypothesis is based upon the physiological
and psychophysical evidence currently available about the possible role of the MOC efferent system
(see the summary in Sec. B of this report). To model the functions of post-AN processing we propose
a psychophysically based approach. The post-AN functions will be modeled as a template-matching
system, in which a time-frequency input pattern is matched against internal templates using a
psychophysically derived distance measure. We suggest that the success of post-AN mechanisms in
reliably extracting speech-related information in noise is partly due to the “stabilizing” effect
of the efferent system.

This report summarizes the work completed in Phase I of the STTR program. We have implemented a
closed-loop model of the auditory periphery with efferent-inspired feedback and have demonstrated
its ability to produce spectrograms of noisy speech samples that are more consistent with
spectrograms of speech in quiet than are spectrograms produced by open-loop models of the auditory
periphery. As a baseline system we used a model of an open-loop linear cochlea, described in
Sec. C.1. In Sec. C.2 we compare the performance of the baseline system with that of a model of an
open-loop nonlinear cochlea. In Sec. C.3 we introduce a model of a closed-loop nonlinear cochlea
with efferent-inspired feedback.

The output of each model was defined as the temporal response of the simulated Inner Hair Cell
(IHC) array, organized in the form of spectrograms. The output of the closed-loop model was
compared quantitatively with the output of the baseline open-loop model. The criterion for
comparison was the consistency between the spectrographic representation of noisy speech segments
and that of the corresponding speech signals in quiet. Consistency was measured in terms of the
distance between the noisy representations (with noise intensity and SNR as parameters) and the
representations of the speech in quiet (the reference). Sec. D presents the quantitative
evaluation. It shows that the closed-loop auditory model produces representations that are far
more stable than those produced by the baseline (open-loop) auditory model. Whether this model of
the auditory periphery preserves phonetic information in patterns consistent with those observed
psychophysically will be rigorously examined during Phase II, in which the central part of the
proposal, i.e. the formulation of a perception-based distance measure, will be established.

¹ The MOC nerve bundle originates in the medial region of the superior olive and projects back to
different places along the cochlear partition in a tonotopic manner, making synaptic connections
to the outer hair cells. A detailed description is provided in Sec. B.

B. MOC EFFERENTS - BRIEF REVIEW
B.1 MOC efferents: morphology and physiology

Numerous papers have been published providing detailed morphological and neurophysiological
descriptions of the medial olivocochlear (MOC) efferent feedback system (e.g., Guinan, 1996; May
and Sachs, 1992; Winslow and Sachs, 1988). MOC efferents originate from neurons medial, ventral and
anterior to the medial superior olivary nucleus (MSO), have myelinated axons, and terminate
directly on Outer Hair Cells (OHC). Medial efferents project predominantly to the contralateral
cochlea; the innervation is largest near the center of the cochlea, with the crossed innervation
biased toward the base compared to the uncrossed innervation (e.g., Guinan, 1996). Roughly
two-thirds of medial efferents respond to ipsilateral sound, one-third to contralateral sound, and
a small fraction to sound in either ear. Medial efferents have tuning curves that are similar to,
or slightly wider than, those of AN fibers, and they project to different places along the
cochlear partition in a tonotopic manner. Finally, medial efferents have longer latencies and group
delays than AN fibers. In response to tone or noise bursts, most MOC efferents have latencies of
10-40ms. Group delays measured from modulation transfer functions are much more tightly clustered,
averaging about 8ms. We currently do not have a clear understanding of the functional role of this
mechanism. A few suggestions have been offered, such as shifting sound-level functions to higher
sound levels, an antimasking effect on responses to transient sounds in a continuous masker, and
protection against damage due to intense sound (e.g., Guinan, 1996). One speculated role, which is
of particular interest for this proposal, is a dynamic regulation of the cochlear operating point
depending on background acoustic stimulation, resulting in robust human performance in perceiving
speech in a noisy background. A few neurophysiological studies support this role. Using
anesthetized cats with noisy acoustic stimuli, Winslow and Sachs (1988), for example, showed that
electrically stimulating the MOC nerve bundle partly recovers the dynamic range of discharge rate
at the AN. Measuring neural responses of awake cats to noisy acoustic stimuli, May and Sachs (1992)
showed that the dynamic range of discharge rate at the AN level is only moderately affected by
changes in the level of background noise.

B.2 MOC efferents: psychophysics - speech and speech-like stimuli

A few behavioral studies indicate a potential role of the MOC efferent system in perceiving speech
in the presence of background noise. Dewson (1968) presented evidence that MOC lesions impair the
ability of monkeys to discriminate the vowel sounds [i] and [u] in the presence of masking noise
but have no effect on the performance of this task in quiet. More recently, Giraud et al. (1997)
and Zeng et al. (2000) showed that phoneme perception in a noisy background deteriorates in human
subjects who have undergone a vestibular neurectomy (presumably resulting in severed MOC feedback).
These speech reception experiments, however, provide questionable evidence because of surgical side
effects, such as uncertainty about the extent of the lesion and possible damage to cochlear
elements. Recently, Ghitza (2004) quantified the role of the MOC efferent system by performing a
test of initial consonant reception (the Diagnostic Rhyme Test, DRT) using subjects with normal
hearing. Activation of selected parts of the efferent system was attempted by presenting speech and
noise in various configurations (gated/continuous, monaural/binaural). Initial results of these
experiments show a gated/continuous difference analogous to the 'masking overshoot' in tone
detection. These results are interpreted as supporting the hypothesis of a significant efferent
contribution to initial-phone discrimination in noise.

B.3 Summary

A growing body of physiological data supports the effect of MOC efferents on the mechanical
properties of the cochlea and, in turn, on the enhancement of signal properties at the auditory
nerve level, in particular when the signal is embedded in noise. The current theory of the role of
MOC efferents in hearing is that they cause a reduction in OHC motility and changes in OHC shape
that result in increased basilar membrane stiffness, which in turn produces an inhibited IHC
response in the presence of noise that is comparable to the IHC response produced in a noiseless
environment. We develop this popular theory into a closed-loop model of the auditory periphery that
adaptively adjusts its cochlear operating point such that the time-frequency IHC rate responses are
more consistent across clean and noisy conditions than those of state-of-the-art open-loop systems
that neglect efferent feedback.

C. PHASE I - MODEL DEVELOPMENT

The overall goal of Phase I was to develop a closed-loop model of the auditory periphery that
incorporates a model of the human efferent system, and to demonstrate the ability of such a model
to produce displays of noisy speech that are more consistent with displays of speech in quiet than
are displays produced by open-loop models. In embarking on this endeavor, we tested different models
of cochlear filters, linear [Gammatone filters (Patterson, 1995)] as well as nonlinear [MBPNL
(Goldstein, 1990)].

In implementing a cochlear model we use a bank of overlapping cochlear channels uniformly
distributed along the ERB scale (Moore and Glasberg, 1983), four channels per ERB. Each cochlear
channel comprises a filter (Gammatone or MBPNL) followed by a generic model of the IHC (half-wave
rectification followed by a low-pass filter, representing the reduction of synchrony with
increasing CF). The dynamic range of the simulated IHC response is restricted, from below and
above, to a “dynamic-range window” (DRW) representing the observed dynamic range at the AN level
(i.e. the AN rate-intensity function); the lower and upper bounds of the DRW stand for the
spontaneous rate and rate saturation, respectively.
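To make the channel layout concrete, the following is a minimal Python sketch of the front-end bookkeeping described above: center frequencies spaced four per ERB, a generic IHC stage, and DRW clipping. It is not the report's implementation. It assumes the Glasberg and Moore (1990) ERB-number formula (in place of the 1983 formula cited in the text), a second-order Butterworth low-pass with a hypothetical 1 kHz cutoff for the IHC stage, and DRW bounds expressed in dB. The cochlear filters themselves (Gammatone or MBPNL) are omitted, and all function names are illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter


def erb_number(f_hz):
    """ERB-number (Cams) of f_hz. Assumption: Glasberg & Moore (1990) formula."""
    return 21.4 * np.log10(4.37e-3 * np.asarray(f_hz, dtype=float) + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number: frequency in Hz for a given ERB-number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 4.37e-3


def channel_cfs(f_lo_hz, f_hi_hz, per_erb=4):
    """Channel CFs spaced uniformly on the ERB-number scale, per_erb channels per ERB."""
    e_lo, e_hi = erb_number(f_lo_hz), erb_number(f_hi_hz)
    n = int(np.floor((e_hi - e_lo) * per_erb)) + 1
    return erb_number_to_hz(e_lo + np.arange(n) / per_erb)


def ihc_stage(x, fs, lp_cutoff_hz=1000.0):
    """Generic IHC: half-wave rectification, then a low-pass filter.

    The 2nd-order Butterworth shape and the 1 kHz cutoff are hypothetical choices.
    """
    b, a = butter(2, lp_cutoff_hz / (fs / 2.0))
    return lfilter(b, a, np.maximum(x, 0.0))


def to_db(x, floor=1e-12):
    """Amplitude to dB, with a small floor to avoid log(0)."""
    return 20.0 * np.log10(np.maximum(np.abs(x), floor))


def apply_drw(level_db, lower_db, width_db):
    """Restrict a dB-valued response to the DRW [lower_db, lower_db + width_db].

    lower_db plays the role of spontaneous rate, the upper bound of rate saturation.
    """
    return np.clip(level_db, lower_db, lower_db + width_db)
```

For example, `channel_cfs(100.0, 8000.0)` yields 121 channels a quarter-ERB apart, and `apply_drw(to_db(ihc_stage(x, fs)), lower_db, 40.0)` gives a 40dB-wide window like the one used for the Gammatone baseline.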

C.1. Linear cochlear model with Gammatone filters

A linear Gammatone filter bank, representing a linear filtering strategy, was first examined as a
baseline. Displays of the simulated IHC response were examined for noise intensity levels of 70,
60, and 50dB_SPL and for SNR values of 20, 10, and 5dB. Figure 2 provides a spectrographic example.
The figure contains a 3-by-3 matrix of images; the abscissa represents the intensity of the
background noise, in dB_SPL, and the ordinate represents SNR, in dB. Each image shows the simulated
IHC response to the diphone s_a (duration 249ms), spoken by a male speaker, as produced by the
open-loop Gammatone model with DRW=40dB. The position of the DRW was set such that speech is visible
for the 50dB_SPL×SNR=5dB condition, and its upper bound was chosen such that the 70dB_SPL×SNR=20dB
condition is not oversaturated. A large inconsistency is observed across varying noise intensity
and SNR levels. Note that for the chosen DRW, at the 50dB_SPL noise intensity level much of the
speech energy is absent from the simulated IHC response. Had the DRW been shifted lower, more of the
speech energy at the 50dB_SPL noise intensity level would have been visible, but so would much of
the noise.
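The 3-by-3 condition matrix (noise at 70, 60 and 50dB_SPL; SNR of 20, 10 and 5dB) can be generated with a short mixing routine. The sketch below is an illustration under stated assumptions, not the procedure used for the report: it assumes a hypothetical digital calibration in which an RMS of 1.0 corresponds to `REF_DB_SPL` = 100dB_SPL, and it defines SNR over the whole arrays passed in (for example, speech zero-padded after a noise lead-in), which is only one of several possible conventions.

```python
import numpy as np

# Hypothetical calibration: a digital RMS of 1.0 corresponds to REF_DB_SPL.
REF_DB_SPL = 100.0
NOISE_LEVELS_DB_SPL = (70.0, 60.0, 50.0)   # noise levels of the 3-by-3 grid in Sec. C.1
SNRS_DB = (20.0, 10.0, 5.0)                # SNR values of the same grid


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


def scale_to_db_spl(x, target_db_spl, ref_db_spl=REF_DB_SPL):
    """Scale x so that its RMS level equals target_db_spl under the assumed calibration."""
    target_rms = 10.0 ** ((target_db_spl - ref_db_spl) / 20.0)
    return x * (target_rms / rms(x))


def mix(speech, noise, noise_db_spl, snr_db, ref_db_spl=REF_DB_SPL):
    """Return (mixture, scaled speech, scaled noise) for one noise-level x SNR condition.

    Assumption: SNR is measured over the whole arrays, which must have equal length.
    """
    n = scale_to_db_spl(noise, noise_db_spl, ref_db_spl)
    s = scale_to_db_spl(speech, noise_db_spl + snr_db, ref_db_spl)
    return s + n, s, n


def condition_grid(speech, noise, ref_db_spl=REF_DB_SPL):
    """All noise-level x SNR mixtures, keyed by (noise_db_spl, snr_db)."""
    return {(lvl, snr): mix(speech, noise, lvl, snr, ref_db_spl)[0]
            for lvl in NOISE_LEVELS_DB_SPL for snr in SNRS_DB}
```

Note that under this convention the speech level is tied to the noise level (speech level = noise level + SNR), so each row of the matrix holds SNR fixed while absolute level changes, which is exactly the dimension along which the open-loop displays are inconsistent.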

C.2. Open-loop nonlinear cochlear model

A second model that we examined was Goldstein’s Multi Band Pass Non Linear (MBPNL) model of
nonlinear cochlear mechanics (Goldstein, 1990). This model operates in the time domain and changes
its gain and bandwidth with changes in the input intensity, in accordance with observed
physiological and psychophysical behavior. The MBPNL model is shown in Figure 3. The lower path
(H1/H2) is a compressive nonlinear filter that represents the sensitive, narrowband compressive
nonlinearity at the tip of the basilar-membrane tuning curves. The upper path (H3/H2) is a linear
filter (an expanding function preceded by its inverse yields an identity mapping) that represents
the insensitive, broadband linear tail response of basilar-membrane tuning curves (after Goldstein,
1990). The parameter G controls the gain of the tip of the basilar-membrane tuning curves and is
used to model the inhibitory efferent-induced response in the presence of noise (see Sec. C.3
below). For the open-loop MBPNL model the tip gain is set to G=40dB, to best mimic psychophysical
tuning curves of a healthy cochlea in quiet (Goldstein, 1990).

The "iso-input" frequency response of an MBPNL filter at a CF of 3400Hz is shown in Figure 4. The
frequency response of the open-loop MBPNL model (i.e. for G=40dB) is shown in the upper-left
corner. For an input signal s(t) = A sin(ω₀t), with A and ω₀ fixed, the MBPNL behaves as a linear
system with a fixed "operating point" on the expanding and compressive nonlinear curves, determined
by A. Figure 4 shows the iso-input frequency response of the system for different values of A. For
a given A, a discrete "chirp" signal with a slowly changing frequency was presented to the MBPNL;
ω₀ was changed only after the system had reached steady state, to allow a proper gain measurement.
For a 0dB input level (A=1), the gain at CF is approximately 40dB. As the input level increases,
the gain drops and the bandwidth increases, in accordance with physiological and psychophysical
behavior.
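The qualitative behavior described above (about 40dB of gain at CF for A=1, with gain falling and bandwidth widening as A grows) can be reproduced with a deliberately crude caricature of the two-path idea. This is not Goldstein's MBPNL: the sketch below replaces the H1/H2/H3 filters with assumed second-order resonance magnitude shapes (a sharp "tip" with Q=8, a broad "tail" with Q=0.7), uses a hypothetical `knee * tanh(u / knee)` saturation as the tip compression, and sums the two paths in amplitude, ignoring phase and harmonics. Each tone is treated as already at steady state, mirroring the iso-input measurement procedure.

```python
import numpy as np


def resonance_mag(f, cf, q):
    """Magnitude of a second-order resonance with peak 1 at cf (assumed stand-in shape)."""
    r = np.asarray(f, dtype=float) / cf
    return 1.0 / np.sqrt(1.0 + q ** 2 * (r - 1.0 / r) ** 2)


def two_path_output_amp(a, f, cf, g_db, q_tip=8.0, q_tail=0.7, knee=1000.0):
    """Steady-state output amplitude of a caricature tip + tail model for a tone of amplitude a.

    Tail: linear and broad. Tip: narrow, gain G, then saturating compression.
    q_tip, q_tail and knee are hypothetical; paths are summed in amplitude (phase ignored).
    """
    g_lin = 10.0 ** (g_db / 20.0)
    tail = resonance_mag(f, cf, q_tail) * a
    tip = knee * np.tanh(g_lin * resonance_mag(f, cf, q_tip) * a / knee)
    return tail + tip


def iso_input_gain_db(a, freqs, cf, g_db, **kw):
    """Gain (dB) versus frequency for a fixed input amplitude a: one iso-input curve."""
    return 20.0 * np.log10(two_path_output_amp(a, freqs, cf, g_db, **kw) / a)


def bandwidth_hz(freqs, gain_db, drop_db=10.0):
    """Width of the frequency region whose gain lies within drop_db of the peak."""
    above = freqs[gain_db >= gain_db.max() - drop_db]
    return float(above.max() - above.min())
```

With CF = 3400Hz and G=40dB, the caricature gives about 40dB at CF for A=1, about 21dB for A=100 and under 1dB for A=10⁴. Because the tip saturates at high levels, lowering G to 10dB cuts the low-level response by roughly 28dB while leaving the high-level response essentially unchanged, the same pattern the text reports for the real model.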

Figure 5 shows the simulated IHC response generated by the open-loop MBPNL to the diphone s_a (same
as in Fig. 2) for noise intensity levels of 70, 60, and 50dB_SPL and for SNR values of 20, 10, and
5dB. The tip gain is set to G=40dB and held constant for all SNR and noise levels. Here, we set
DRW=22dB (down from 40dB for the Gammatone) because of the reduction in the overall dynamic range at
the MBPNL output due to its inherent nonlinearity. The position of the DRW was chosen such that the
speech energy of the simulated IHC response for the 70dB_SPL×SNR=5dB condition matched that of the
same condition for the Gammatone model. Like the displays produced by the Gammatone model, the
open-loop MBPNL displays show a large inconsistency across varying noise levels. Notice that for
both open-loop models (Gammatone- and MBPNL-based) we could not find a “sweet spot” for the DRW
position that would provide a consistent display at the output across rows and columns.

C.3. Cochlear model with efferent-inspired feedback

From the open-loop MBPNL model, we developed a closed-loop MBPNL model that includes an
efferent-inspired feedback mechanism. Morphologically (e.g. Guinan, 1996), MOC neurons project to
different places along the cochlear partition in a tonotopic manner, making synaptic connections to
the outer hair cells and, hence, affecting the mechanical properties of the cochlea (e.g. an
increase in basilar-membrane stiffness). Therefore, we introduce a frequency-dependent feedback
mechanism that controls the tip gain (G) of each MBPNL channel according to the intensity level of
sustained noise in that frequency band. As shown in Fig. 4, the upper-left panel represents the
nominal response (i.e. healthy cochlea, in quiet), with tip gain G=40dB. Reducing G attenuates the
MBPNL response to weaker stimuli (e.g. background noise). The lower-right panel, for example, shows
the MBPNL response for G=10dB. For high-energy tone stimuli the MBPNL response is hardly affected,
while the response to low-energy stimuli (e.g. -80dB re maximum input range) is reduced by some
30dB. In our efferent-inspired model, G is adjusted such that the average power of the cochlear
output in response to the background noise keeps the simulated IHC response to noise just below
the lower bound of the DRW.
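The gain rule in the last sentence can be phrased as a one-dimensional search per channel: find the largest G (at most the nominal 40dB) for which the channel's noise output stays just below the DRW floor. The Python sketch below does this by bisection and is a hedged illustration, not the report's controller. It assumes, for the default channel model, that in the low-level regime the tip dominates so that output level ≈ input noise level + G; any output model that is non-decreasing in G can be substituted. The 1dB margin and the 0dB gain floor are hypothetical.

```python
import numpy as np

G_NOMINAL_DB = 40.0  # tip gain of a healthy cochlea in quiet (Sec. C.2)


def linear_regime_output_db(noise_in_db, g_db):
    """Assumed channel model: at low levels the tip dominates, so output ~= input + G."""
    return noise_in_db + g_db


def solve_tip_gain(output_db_fn, noise_in_db, target_db,
                   g_min_db=0.0, g_max_db=G_NOMINAL_DB, tol_db=0.01):
    """Largest G (within tol_db) whose noise output stays at or below target_db.

    Bisection; output_db_fn(noise_in_db, g_db) must be non-decreasing in g_db.
    g_min_db is a hypothetical gain floor.
    """
    if output_db_fn(noise_in_db, g_max_db) <= target_db:
        return g_max_db            # quiet band: keep the nominal gain
    if output_db_fn(noise_in_db, g_min_db) >= target_db:
        return g_min_db            # very loud band: clamp at the gain floor
    lo, hi = g_min_db, g_max_db    # invariant: out(lo) <= target < out(hi)
    while hi - lo > tol_db:
        mid = 0.5 * (lo + hi)
        if output_db_fn(noise_in_db, mid) > target_db:
            hi = mid
        else:
            lo = mid
    return lo


def tip_gains(noise_band_db, drw_lower_db, margin_db=1.0,
              output_db_fn=linear_regime_output_db):
    """Per-channel G placing each band's noise response margin_db below the DRW floor."""
    target = drw_lower_db - margin_db
    return np.array([solve_tip_gain(output_db_fn, n, target) for n in noise_band_db])
```

Returning the lower end of the bisection bracket guarantees the noise output never exceeds the target, which matches the "just below the lower bound" requirement rather than merely approximating it.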

Figure 6 depicts the simulated IHC response of an intermediate version of our closed-loop MBPNL
model. The DRW is 22dB wide, and its position is fixed at the same location as in the open-loop
MBPNL model. The tip gain (G) of each cochlear channel is adjusted using the average power per
frequency band, computed over a 300ms segment of speech-shaped noise preceding the speech signal.
Owing to the noise-responsive feedback, the display of background noise is largely eliminated for
all dB_SPL×SNR conditions. At a given SNR, displays of processed noisy speech are consistent across
dB_SPL noise levels (rows in Fig. 6). As expected, at a fixed dB_SPL level, as the SNR drops (i.e.
as the speech energy drops) the intensity of speech information in the spectrographic display dims
(columns in Fig. 6).
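The per-band noise estimate that drives the gain adjustment can be computed from the 300ms noise lead-in alone. In this Python sketch the input is assumed to be an array of band-filtered signals with shape (channels, samples); which filter produces those bands is not specified in the text and is left open here. The function name and the small `eps` floor are illustrative.

```python
import numpy as np


def leadin_band_levels_db(channel_outputs, fs, leadin_s=0.3, eps=1e-12):
    """Average power (dB) per channel over the first leadin_s seconds (the noise lead-in).

    channel_outputs: array of shape (n_channels, n_samples) of band-filtered signals.
    Samples after the lead-in (where speech starts) are ignored.
    """
    channel_outputs = np.asarray(channel_outputs, dtype=float)
    n = int(round(leadin_s * fs))
    if channel_outputs.shape[1] < n:
        raise ValueError("signal is shorter than the noise lead-in window")
    seg = channel_outputs[:, :n]
    return 10.0 * np.log10(np.mean(seg ** 2, axis=1) + eps)
```

Because only the lead-in is averaged, the estimate reflects the sustained background noise and is unaffected by the speech that follows, consistent with a feedback loop that responds to the noise rather than to the signal.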

Figure 8 shows the spectrographic displays of our current closed-loop MBPNL model, in which the
output of each MBPNL channel is normalized to a fixed dynamic range. The rationale for the
normalization at the output stems from neurophysiological studies on anesthetized cats with noisy
acoustic stimuli, which show that electrically stimulating the MOC nerve bundle recovers the
dynamic range of discharge rate at the AN (e.g. Winslow and Sachs, 1988)², as illustrated in
Fig. 7. Visual inspection shows that the displays are even more consistent across dB_SPL×SNR
conditions than those of Fig. 6.
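The text does not specify how each channel is normalized to a fixed dynamic range. One simple reading, offered here only as a hypothetical stand-in, is a per-channel min-max mapping of the dB-valued response onto the DRW, so that every channel spans the same output range regardless of its absolute level:

```python
import numpy as np


def normalize_channels(resp_db, lower_db, width_db, eps=1e-9):
    """Map each channel's (row's) dB range linearly onto [lower_db, lower_db + width_db].

    resp_db: array of shape (n_channels, n_frames). A flat channel maps to lower_db.
    Assumption: min-max scaling stands in for the report's (unspecified) normalization.
    """
    resp_db = np.asarray(resp_db, dtype=float)
    lo = resp_db.min(axis=1, keepdims=True)
    hi = resp_db.max(axis=1, keepdims=True)
    return lower_db + width_db * (resp_db - lo) / np.maximum(hi - lo, eps)
```

One trade-off of this choice: a channel that carries little more than residual noise is stretched to the full window as well, so in practice it may be preferable to apply the normalization only after the noise floor has been pushed below the DRW by the gain feedback.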

D. QUANTITATIVE EVALUATION

To obtain quantitative results, 96 processed noisy diphone pairs were compared in a simulated
two-alternative forced-choice (2AFC) DRT test. Tests were run on the outputs of the open-loop
Gammatone and the efferent-inspired closed-loop MBPNL models, after temporal smoothing. Template
“states” were chosen for each DRT diphone pair. In this study, the template states were the
processed diphones at the 70dB_SPL×SNR=10dB condition (top two panels in Figure 9). The test
stimuli were the same diphone tokens at different noise intensity levels and different values of
SNR. For a given test token, the MSE distance between the test token and each of the two template
states was computed, and the template state with the smaller MSE distance was selected as the
simulated DRT response. Figure 10 shows the average percent correct responses as a function of
noise intensity level for the open-loop Gammatone (+) and the closed-loop MBPNL (x) models,
averaged over all DRT words and all SNR values. As the plot indicates, the closed-loop MBPNL model
behaved more consistently over all noise intensity levels than the open-loop system. The
performance of the open-loop system degraded significantly as the noise intensity level departed
from the template noise intensity level (70dB_SPL in this example). Figure 11 shows a more detailed
version of Fig. 10: errors, averaged over all DRT words, are plotted as a function of SNR, with
noise intensity (in dB_SPL) as a parameter. For the open-loop model, the best performance occurs at
70dB_SPL, the template noise condition (as expected, no errors occur at SNR=10dB, the template
SNR). The extent of the inconsistency is reflected by the poor (close to chance) performance at all
other noise intensities, for all SNR values (an unexplained exception is the 60dB_SPL×SNR=20dB
condition). In contrast, performance with the closed-loop MBPNL model is very consistent across all
conditions. Figure 12 shows yet another view of the same data; here, errors are plotted as a
function of noise intensity, with SNR as the parameter. A similar conclusion can be drawn: for the
open-loop model, at each SNR the best performance occurs at 70dB_SPL (the template noise
condition), and at all other noise intensity levels performance is close to chance. Far fewer
errors are made when the closed-loop model is used; most of these errors occur at noise intensity
levels away from the template noise condition.

² Concurring with this observation are measurements of neural responses of awake cats to noisy
acoustic stimuli, showing that the dynamic range of discharge rate at the AN level is hardly
affected by changes in levels of background noise (May and Sachs, 1992).
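The simulated DRT decision described in Sec. D reduces to a nearest-template rule under an MSE distance. The Python sketch below illustrates that rule; it is not the report's code. It assumes the test token and both templates are time-aligned arrays of identical shape (channels by frames), uses a moving-average smoother with a hypothetical 5-frame window as a stand-in for the unspecified temporal smoothing, and breaks ties in favor of the first template.

```python
import numpy as np


def smooth_time(spec, win=5):
    """Moving-average smoothing along time (axis 1); the window length is an assumption."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, spec)


def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def simulated_drt_response(test, template_a, template_b):
    """One-interval 2AFC: return 0 or 1, the template with the smaller MSE to the test token.

    All arrays must be time-aligned and of equal shape; ties go to template_a.
    """
    return 0 if mse(test, template_a) <= mse(test, template_b) else 1


def percent_correct(trials):
    """trials: iterable of (test, template_a, template_b, correct_index) tuples."""
    trials = list(trials)
    hits = sum(simulated_drt_response(t, a, b) == k for t, a, b, k in trials)
    return 100.0 * hits / len(trials)
```

In the setting of Sec. D, the two templates for each word pair would be the smoothed model outputs at 70dB_SPL×SNR=10dB, and `percent_correct` would be evaluated separately per noise level (and per SNR) to produce curves like those of Figs. 10 to 12; chance performance corresponds to 50%.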

E. SUMMARY

This report summarizes the work completed in Phase I of the STTR program. We have implemented a
closed-loop model of the auditory periphery with efferent-inspired feedback and have quantitatively
demonstrated its ability to produce spectrograms of noisy speech samples that are far more
consistent with spectrograms of speech in quiet than are spectrograms produced by an open-loop
model of the auditory periphery. This improvement in performance and robustness in noise mimics
the general behavior observed in humans. Whether this model of the auditory periphery preserves
phonetic information in patterns consistent with those observed psychophysically will be
rigorously examined during Phase II, in which the central part of the proposal, i.e. the
formulation of a perception-based distance measure, will be established.
Figure 1. A schematic description of our conceptual model of perception of diphones.
Figure 2. Simulated IHC response to diphone s_a, produced by an open-loop Gammatone model; DRW=40dB.
Position of DRW set such that speech is visible for the 50dB_SPL noise and SNR=5dB condition. Upper
bound of DRW chosen such that the 70dB_SPL×SNR=20dB condition is not oversaturated.




Figure 4. Iso-input frequency responses of an MBPNL filter (at a CF of 3400Hz) for different values
of tip gain, G. From upper-left, clockwise: G=40, 30, 20 and 10dB. The upper-left panel (G=40dB) is
for a healthy cochlea in quiet (Goldstein, 1990).
Figure 7. Illustration of the observed efferent-induced recovery of the dynamic range of the
discharge rate in the presence of background noise (e.g. Winslow and Sachs, 1988). Discharge rate
versus tone level is cartooned for the quiet condition (full dynamic range, black); for the
anesthetized cat, i.e. no efferent activity (much reduced dynamic range, red); and with electrical
stimulation of the COCB nerve bundle.
Figure 8. Simulated IHC response to diphone s_a, produced by the efferent-inspired closed-loop
MBPNL. DRW is the same as in the open-loop MBPNL model. Output of each MBPNL channel is normalized
to a fixed dynamic range.
Figure 9. Temporally smoothed simulated IHC response produced by the efferent-inspired closed-loop
MBPNL (with normalization at the output). Representations at the 70dB_SPL×SNR=10dB condition are
chosen as template "states". A simulation of the “one-interval two-alternative forced-choice”
paradigm is conducted for each DRT word pair.


Figure 10. Percent correct responses as a function of noise intensity level for the open-loop
Gammatone (+) and the closed-loop MBPNL (x), using the 70dB_SPL×SNR=10dB condition as template.
Average is over all DRT words and all SNR values.
Figure 11. Same data as in Fig. 10, in more detail. Errors (in percent) are averaged over all DRT
words and plotted as a function of SNR, with noise intensity (in dB_SPL) as a parameter.


[Plot panels: error (percent) versus noise intensity (dB SPL, 50 to 80) for the open-loop Gammatone and closed-loop MBPNL models, one panel per SNR value.]


Figure 12. Same data as in Fig. 10, in more detail. Errors (in percent) are averaged over all DRT
words and plotted as a function of noise intensity, with SNR as a parameter.